1 Introduction

The Consumer Price Index (CPI) is a measure that examines the weighted average of prices of a basket of consumer goods and services. The CPI gives the government, businesses, and citizens an idea about prices changes in the economy, and can act as a guide to make informed decisions about the economy.

This study aims to build a model to forecast the Consumer Price Index for Food and Non-Alcoholic Beverages at a monthly level. From EVDS following statistics and survey results relevant to CPI are obtained as predictors:
- (USD) US Dollar (Buying)
- (EUR) Euro (Buying)
- Personal Interest Rate(TRY)(Including Real Person Overdraft Account)(Flow Data, %)
- Statement on the current financial situation of household
- Financial situation of household expectation (over the next 12 months)
- Assessment on spending money on semi-durable goods This (over the next 3 months compared to the past 3 months)
- Probability of buying durable goods (over the next 12 months compared to the past 12 months)

Using those predictors and their correlation analysis, a regressive model will be built. Then, the model will be analyzed by other parameters, such as significance and residuals. The model will be improved by trial and errors and decided to provide a 2020-12 CPI forecast.

2 Manipulation & Visualization

2.1 Data Manipulation

#Reads data. Read first 108 rows since rest is appendix and explanation section
df <- read.xlsx("data_hw3.xlsx", sheet = 1,
  startRow = 1,
  colNames = TRUE,
  rowNames = FALSE,
  detectDates = TRUE,
  skipEmptyRows = TRUE,
  skipEmptyCols = TRUE,
  rows = c(1:109),
  cols = NULL,
  check.names = FALSE,
  sep.names = ".",
)

#Adds data.table class to data frame and checks
DT <- data.table(df)
print("Classes of the DT object")
## [1] "Classes of the DT object"
class(DT)
## [1] "data.table" "data.frame"
#Sets column names 
setnames(DT, c("date","us_dollar","euro", "interest_rate", "cpi", "current_financial_situation", "expected_financial_situation", "semi_durable_spending", "durable_spending" ) )




print("Head of the DT object")
## [1] "Head of the DT object"
head(DT)
##       date us_dollar     euro interest_rate    cpi current_financial_situation
## 1: 2012-01  1.838936 2.372327            NA 214.65                    80.80097
## 2: 2012-02  1.751090 2.314605            NA 219.12                    80.61122
## 3: 2012-03  1.779295 2.350436            NA 218.39                    78.18104
## 4: 2012-04  1.779840 2.343220            NA 218.12                    74.95274
## 5: 2012-05  1.796955 2.305236            NA 212.89                    80.53289
## 6: 2012-06  1.816071 2.274657            NA 205.61                    80.48460
##    expected_financial_situation semi_durable_spending durable_spending
## 1:                     92.08296              102.5660         93.34646
## 2:                     92.47039              105.5197         95.34939
## 3:                     91.89205              107.1729         95.33989
## 4:                     87.92141              103.6142         92.36463
## 5:                     89.54145              106.5546         92.78870
## 6:                     90.39104              107.2794         94.38445
#print("Tail of the DT object")
#tail(DT)

#There are some NA values in the interest_rate column. I will fill NA values with fist non-NA value
DT[is.na(DT)] <- 39.62


#Formats date as date format. Day is needed in order to convert so I added 01 to every month as day.
DT[, date := as.Date(paste(date, "01", sep = "-")), ]

print("Head of the formatted DT object")
## [1] "Head of the formatted DT object"
head(DT)
##          date us_dollar     euro interest_rate    cpi
## 1: 2012-01-01  1.838936 2.372327         39.62 214.65
## 2: 2012-02-01  1.751090 2.314605         39.62 219.12
## 3: 2012-03-01  1.779295 2.350436         39.62 218.39
## 4: 2012-04-01  1.779840 2.343220         39.62 218.12
## 5: 2012-05-01  1.796955 2.305236         39.62 212.89
## 6: 2012-06-01  1.816071 2.274657         39.62 205.61
##    current_financial_situation expected_financial_situation
## 1:                    80.80097                     92.08296
## 2:                    80.61122                     92.47039
## 3:                    78.18104                     91.89205
## 4:                    74.95274                     87.92141
## 5:                    80.53289                     89.54145
## 6:                    80.48460                     90.39104
##    semi_durable_spending durable_spending
## 1:              102.5660         93.34646
## 2:              105.5197         95.34939
## 3:              107.1729         95.33989
## 4:              103.6142         92.36463
## 5:              106.5546         92.78870
## 6:              107.2794         94.38445
#print("Tail of the DT object")
#tail(DT)

#Convert data into Time-Series Object
TS <-ts(DT,start = c(2012, 1),frequency=12)
print("Classes of the TS object")
## [1] "Classes of the TS object"
class(TS)
## [1] "mts"    "ts"     "matrix"

2.2 Data Visualization

#Visual representation of the data as line plots
fig <- plot_ly(
  type = "scatter",
  x = DT$date, 
  y = DT$cpi,
  name = 'Consumer Price',
  mode = "lines",
  line = list(
        color = '#ff7c43'
  )) 

fig <- fig %>%
  layout(
    title = "CPI FOR FOOD AND NON-ALCOHOLIC BEVERAGES 2012-2020",
      hovermode = "x unified",
    legend = list(x = 0.1, y = 0.9),
      yaxis = list(
          title = "TRY"
    ),
    xaxis = list(
        type = 'date',
        tickformat = "%Y-%m",
        title="DATE"
  ))

fig
fig <- plot_ly(
  type = "scatter",
  x = DT$date, 
  y = DT$euro,
  name = 'EUR €',
  mode = "lines",
  line = list(
        color = '#ff7c43'
  )) 
fig <- fig %>%
  add_trace(
    type = "scatter",
  x = DT$date, 
  y = DT$us_dollar,
    name = 'USD $',
    mode = "lines",
    line = list(
        color = '#2f4b7c'
  )) 
fig <- fig %>%
  layout(
    title = "CURRENCY EXCHAGE RATES 2012-2020",
      hovermode = "x unified",
    legend = list(x = 0.1, y = 0.9),
      yaxis = list(
          title = "TRY"
    ),
    xaxis = list(
        type = 'date',
        tickformat = "%Y-%m",
        title="DATE"
  ))

fig
fig <- plot_ly(
  type = "scatter",
  x = DT$date, 
  y = DT$current_financial_situation,
  name = 'Current Financial Situation',
  mode = "lines",
  line = list(
        color = '#ff7c43'
  )) 
fig <- fig %>%
  add_trace(
    type = "scatter",
  x = DT$date, 
  y = DT$expected_financial_situation,
    name = 'Expected Financial Situation',
    mode = "lines",
    line = list(
        color = '#2f4b7c'
  )) 
fig <- fig %>%
  add_trace(
    type = "scatter",
  x = DT$date, 
  y = DT$semi_durable_spending,
    name = 'Semi-durable Goods Spending ',
    mode = "lines",
    line = list(
        color = '#665191'
  )) 
fig <- fig %>%
  add_trace(
    type = "scatter",
  x = DT$date, 
  y = DT$durable_spending,
    name = 'Durable Goods Spending',
    mode = "lines",
    line = list(
        color = '#f95d6a'
  )) 
fig <- fig %>%
  layout(
    title = "SURVEYS ON FINANCIAL SITUATION AND SPENDING 2012-2020",
    legend = list(x = 100, y = 0.5),
    yaxis = list(
          title = "SURVEY SCORE"
    ),
    xaxis = list(
        type = 'date',
        tickformat = "%Y-%m",
        title="DATE"
      ))

fig
fig <- plot_ly(
  type = "scatter",
  x = DT$date, 
  y = DT$interest_rate,
  name = 'Personal (TRY) INTEREST RATES',
  mode = "lines",
  line = list(
        color = '#ff7c43'
  )) 

fig <- fig %>%
  layout(
    title = "Personal (TRY) INTEREST RATES",
      hovermode = "x unified",
    legend = list(x = 0.1, y = 0.9),
      yaxis = list(
          title = "PERCENTAGE"
    ),
    xaxis = list(
        type = 'date',
        tickformat = "%Y-%m",
        title="DATE"
  ))

fig
#Remove Date Column and Plot Time Series-Object
TS2 <- TS[,-1]
TS2 <- zoo(TS2)
colnames(TS2)<-c("USD","EUR","Int. Rate","CPI","Cur. Fin. Sit.","Exp. Fin. Sit","Semi-Durable","Durable")
plot(TS2)

plot_list <- vector(mode = "list", length = 8)

plot_list[[1]] <- ggplot( data= DT, aes(x=date, y=cpi) ) +
  theme_classic()+
  theme(plot.title = element_text(size = 6, face = "bold",hjust = 0.5))+
  labs( x="Date",y="CPI",title=("CPI"))+
  geom_line(color="#2f4b7c")+
  geom_smooth(method = "lm", color="#003f5c")

plot_list[[2]] <- ggplot( data= DT, aes(x=date, y=interest_rate) ) +
  theme_classic()+
  theme(plot.title = element_text(size = 6, face = "bold",hjust = 0.5))+
  labs( x="Date",y="PERCENTAGE",title=("INTEREST RATES"))+
  geom_line(color="#2f4b7c")+
  geom_smooth(method = "lm", color="#003f5c")

plot_list[[3]] <- ggplot( data= DT, aes(x=date, y=us_dollar)) +
  theme_classic()+
  theme(plot.title = element_text(size = 6, face = "bold",hjust = 0.5))+
  labs( x="Date",y="TRY",title=("USD EXC. RATE"))+
  geom_line( color="#2f4b7c")+
  geom_smooth(method = "lm", color="#003f5c")

plot_list[[4]] <- ggplot( data= DT, aes(x=date, y=euro)) +
  theme_classic()+
  theme(plot.title = element_text(size = 6, face = "bold",hjust = 0.5))+
  labs( x="Date",y="TRY",title=("EUR EXC. RATE"))+
  geom_line( color="#2f4b7c")+
  geom_smooth(method = "lm", color="#003f5c")

plot_list[[5]] <- ggplot( data= DT, aes(x=date, y=current_financial_situation) ) +
  theme_classic()+
  theme(plot.title = element_text(size = 6, face = "bold",hjust = 0.5))+
  labs( x="Date",y="SURVEY SCORE",title=("SURVEY-CURRENT FINANCIAL SITUATION"))+
  geom_line(color="#2f4b7c")+
  geom_smooth( method = "lm", color="#003f5c")

plot_list[[6]] <- ggplot( data= DT, aes(x=date, y=expected_financial_situation) ) +
  theme_classic()+
  theme(plot.title = element_text(size = 6, face = "bold",hjust = 0.5))+
  labs( x="Date",y="SURVEY SCORE",title=("SURVEY-EXPECTED FINANCIAL SITUATION"))+
  geom_line(color="#2f4b7c")+
  geom_smooth( method = "lm", color="#003f5c")

plot_list[[7]] <- ggplot( data= DT, aes(x=date, y=semi_durable_spending) ) +
  theme_classic()+
  theme(plot.title = element_text(size = 6, face = "bold",hjust = 0.5))+
  labs( x="Date",y="SURVEY SCORE",title=("SURVEY-SEMI-DURABLE SPENDING"))+
  geom_line(color="#2f4b7c")+
  geom_smooth( method = "lm", color="#003f5c")

plot_list[[8]] <- ggplot( data= DT, aes(x=date, y=durable_spending) ) +
  theme_classic()+
  theme(plot.title = element_text(size = 6, face = "bold",hjust = 0.5))+
  labs( x="Date",y="SURVEY SCORE",title=("SURVEY-DURABLE SPENDING"))+
  geom_line(color="#2f4b7c")+
  geom_smooth( method = "lm", color="#003f5c")


do.call('grid.arrange',c(plot_list, ncol = 3, top="LM SMOOTHED LINE PLOTS BETWEEN 2012-2020"))

The general trends and patterns can be seen on the all graphs above. To understand the relationship between CPI and predictors a correlation analysis can be conducted. Correlation analysis also displays the correlation between predictors and this helps to prevent multicollinearity.

2.3 Correlation Analysis

DT_COR <- setcolorder(DT[,-c(1)], c("cpi","us_dollar","euro","current_financial_situation","expected_financial_situation","semi_durable_spending","durable_spending"))
correlation_matrix <- cor(DT_COR)
#correlation_matrix

# Visualizes correlation with ggcorr(). Using lower presents a plot easier to read
correlation_plot_lower <- ggcorrplot(correlation_matrix, method = "square", type = "lower",
           show.legend = TRUE, lab = TRUE, lab_col="white", lab_size=2, digits = 2,
           title = "Lower Correlation Matrix ", colors = c("#d45087","white","#003f5c"),
           legend.title = "Correlation") + theme(plot.title=element_text(hjust=0.5), 
                                                 axis.text.x = element_text(angle = 90, vjust =0.5, hjust=0.5))
correlation_plot_lower

# Creates correlogram with ggpairs
ggpairs(DT_COR)+ theme_minimal() + labs(title ="Correlogram of parameters")

All variables except durable_spending has high enough correlation and significance. Since, food and beverages are perishable goods, no correlation between spending on durable goods and CPI makes sense.

2.4 Multivariable Regresison Modelling

A model for estimating CPI based on USD$, Current and Expected Financial Situation, Spending on Semi-durable Goods, and Interest Rate will be built following correlation analysis.

fit1<-lm(cpi~us_dollar+current_financial_situation+expected_financial_situation+semi_durable_spending+interest_rate,data=DT[date>="2012-01-01" & date<="2020-12-01"])
summary(fit1)
## 
## Call:
## lm(formula = cpi ~ us_dollar + current_financial_situation + 
##     expected_financial_situation + semi_durable_spending + interest_rate, 
##     data = DT[date >= "2012-01-01" & date <= "2020-12-01"])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -53.357  -8.933   0.545   9.142  41.715 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  428.9675    73.6448   5.825  6.7e-08 ***
## us_dollar                     58.0014     1.8332  31.640  < 2e-16 ***
## current_financial_situation   -1.5805     0.7883  -2.005 0.047623 *  
## expected_financial_situation  -0.1139     0.6422  -0.177 0.859576    
## semi_durable_spending         -1.2881     0.4996  -2.578 0.011356 *  
## interest_rate                 -1.1166     0.2861  -3.903 0.000171 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.58 on 102 degrees of freedom
## Multiple R-squared:  0.9795, Adjusted R-squared:  0.9785 
## F-statistic: 975.8 on 5 and 102 DF,  p-value: < 2.2e-16

The model has high R-squared this means An R-squared value close to 1 indicates that the model explains a large portion of the variance in the outcome variable. However, it will always increase when more variables are added to the model, even if those variables are only weakly associated with the response. Adjusted R-squared takes this problem into consideration. Adjusted R-squared is also close to 1 in the model.

It can be seen that p-value of the F-statistic is < 2.2e-16, which is highly significant. This means that, at least, one of the predictor variables is significantly related to the outcome variable. When coefficients inspected, changes in us_dollar and interest_rate are highly significantly associated with changes in CPI. The changes in current_financial_situation and semi_durable_spending are significantly associated with changes in CPI. The change in expected_financial_situation is not significantly associated with changes in CPI.

As the expected_financial_situation variable is not significant, it is possible to remove it from the model. A new model for estimating CPI based on USD$, Current Financial Situation, Spending on Semi-durable Goods, and Interest Rate will be built following correlation analysis.

fit2<-lm(cpi~us_dollar+current_financial_situation+semi_durable_spending+interest_rate,data=DT[date>="2012-01-01" & date<="2020-12-01"])
summary(fit2)
## 
## Call:
## lm(formula = cpi ~ us_dollar + current_financial_situation + 
##     semi_durable_spending + interest_rate, data = DT[date >= 
##     "2012-01-01" & date <= "2020-12-01"])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -52.872  -8.784   0.706   9.087  42.142 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 425.4837    70.6425   6.023 2.67e-08 ***
## us_dollar                    58.2256     1.3209  44.079  < 2e-16 ***
## current_financial_situation  -1.6701     0.6022  -2.774  0.00658 ** 
## semi_durable_spending        -1.2983     0.4940  -2.628  0.00989 ** 
## interest_rate                -1.0933     0.2531  -4.320 3.60e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.5 on 103 degrees of freedom
## Multiple R-squared:  0.9795, Adjusted R-squared:  0.9787 
## F-statistic:  1231 on 4 and 103 DF,  p-value: < 2.2e-16

The model equation can be written as follows for Model1:
 cpi = 426.66 + 58.08us_dollar - 1.72current_financial_situation - 1.26semi_durable_spending - 1.09interest_rate

checkresiduals (fit2)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 68.521, df = 10, p-value = 8.549e-11

Linear regression assumes that the residual errors are assumed to be normally distributed. As a result of visual analysis on the residuals/count histogram, the model complies with the assumption. However, if residuals are correlated the model misses some information. High ACF value indicated positive autocorellation at lag1 and negative autocorrelation at lag7. Those lags can be used as indepented values to improve the model. Also, variance of the residuals increasing.

#Add lags to the data
DT[,lag1:=shift(cpi,type="lag",n=1)]
DT[,lag7:=shift(cpi,type="lag",n=7)]
fit3<-lm(cpi~us_dollar+current_financial_situation+semi_durable_spending+interest_rate+lag1+lag7,data=DT[date>="2012-01-01" & date<="2020-12-01"])
summary(fit3)
## 
## Call:
## lm(formula = cpi ~ us_dollar + current_financial_situation + 
##     semi_durable_spending + interest_rate + lag1 + lag7, data = DT[date >= 
##     "2012-01-01" & date <= "2020-12-01"])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.5413  -4.0913  -0.3673   4.0610  23.9351 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 51.06691   34.76857   1.469    0.145    
## us_dollar                   10.34568    2.33319   4.434 2.51e-05 ***
## current_financial_situation -0.20012    0.29270  -0.684    0.496    
## semi_durable_spending       -0.14051    0.24600  -0.571    0.569    
## interest_rate               -0.05367    0.13696  -0.392    0.696    
## lag1                         0.82671    0.04598  17.980  < 2e-16 ***
## lag7                         0.01891    0.03920   0.483    0.631    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.94 on 94 degrees of freedom
##   (7 observations deleted due to missingness)
## Multiple R-squared:  0.9963, Adjusted R-squared:  0.9961 
## F-statistic:  4257 on 6 and 94 DF,  p-value: < 2.2e-16
checkresiduals (fit3)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 47.187, df = 10, p-value = 8.732e-07

Adding lag1 made many variables insignificant. The insignificant will be removed from the model.

fit4<-lm(cpi~us_dollar+lag1,data=DT[date>="2012-01-01" & date<="2020-12-01"],na.action="na.exclude") 
summary(fit4)
## 
## Call:
## lm(formula = cpi ~ us_dollar + lag1, data = DT[date >= "2012-01-01" & 
##     date <= "2020-12-01"], na.action = "na.exclude")
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.2794  -4.1619   0.2138   3.6426  23.7604 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 14.65280    4.26753   3.434 0.000857 ***
## us_dollar    9.55025    2.08175   4.588 1.26e-05 ***
## lag1         0.86701    0.03325  26.074  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.817 on 104 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.9964, Adjusted R-squared:  0.9964 
## F-statistic: 1.449e+04 on 2 and 104 DF,  p-value: < 2.2e-16
checkresiduals (fit4)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 48.847, df = 10, p-value = 4.344e-07

The model equation can be written as follows for Model2:
 cpi = 14.82 + 9.46us_dollar + 0.87lag1

#Add fitted values and residuals to the data for both chosen models
DT[,predicted1:=fitted(fit2)]
DT[,residuals1:=residuals(fit2)]

DT[,predicted2:=fitted(fit4)]
DT[,residuals2:=residuals(fit4)]
plot_list <- vector(mode = "list", length = 4)
plot_list[[1]] <- ggplot(data = DT )+ 
  theme_classic()+
  theme(plot.title = element_text(size = 10, face = "bold",hjust = 0.5))+
  geom_point(aes(x=predicted1,y=residuals1))+
  geom_abline(slope=1,color="red")+
  labs(x="Model1 Prediction",
       y="Residuals",
       title="Residuals vs Predicted Values for Model1")

plot_list[[2]] <- ggplot(data = DT )+ 
  theme_classic()+
  theme(plot.title = element_text(size = 10, face = "bold",hjust = 0.5))+
  geom_point(aes(x=predicted2,y=residuals2))+
  geom_abline(slope=1,color="red")+
  labs(x="Model2 Prediction",
       y="Residuals",
       title="Residuals vs Predicted Values for Model2")

plot_list[[3]] <- ggplot(data = DT )+ 
  theme_classic()+
  theme(plot.title = element_text(size = 10, face = "bold",hjust = 0.5))+
  geom_point(aes(x=predicted1,y=cpi))+
  geom_abline(slope=1,color="#d45087")+
  labs(x="Model1 Prediction",
       y="CPI",
       title="Actual vs Predicted Values for Model1")

plot_list[[4]] <- ggplot(data = DT )+ 
  theme_classic()+
  theme(plot.title = element_text(size = 10, face = "bold",hjust = 0.5))+
  geom_point(aes(x=predicted2,y=cpi))+
  geom_abline(slope=1,color="#d45087")+
  labs(x="Model2 Prediction",
       y="CPI",
       title="Actual vs Predicted Values for Model2")


do.call('grid.arrange',c(plot_list, ncol = 2, top="COMPARISON OF MODELS"))

In both models mean of residuals are close to 0 and their variances are also similar. Although both models fit the actual CPI values well, Model2 slightly fits better.

fig <- plot_ly(
  type = "scatter",
  x = DT$date, 
  y = DT$cpi,
  name = 'Actual Value',
  mode = "lines+markers",
  line = list(
        color = '#003f5c'
  )) 
fig <- fig %>%
  add_trace(
    type = "scatter",
  x = DT$date, 
  y = DT$predicted1,
    name = 'Model 1 Prediction',
    mode = "lines+markers",
    line = list(
        color = '#f95d6a',
        dash = 'dash'
  )) 
fig <- fig %>%
  add_trace(
    type = "scatter",
  x = DT$date, 
  y = DT$predicted2,
    name = 'Model 2 Prediction',
    mode = "lines+markers",

    line = list(
        color = '#ffa600',
        dash = 'dash'
  )) 
fig <- fig %>%
  layout(
    title = "BOTH MODELS AND ACTUAL CPI VALUES",
      hovermode = "x unified",
    legend = list(x = 0.1, y = 0.9),
      yaxis = list(
          title = "CPI"
    ),
    xaxis = list(
        type = 'date',
        tickformat = "%Y-%m",
        title="DATE"
  ))

fig

2.5 Prediction

lr <- DT[, .SD[.N],]    
lr[,c("cpi","predicted1","predicted2"),]
##       cpi predicted1 predicted2
## 1: 608.06   599.6219   602.5889

Actual CPI is 608.36 Model1 Prediction is 599.62 lower than actual results 8.74 Model2 Prediction is 602.59 lower than actual results 5.77 As mentioned in Comparison of Models section both models are good at predicting CPI, Model2 is slightly better.

3 Conclusion

In this study, we have obtained two regression models that predict the CPI the Consumer Price Index for Food and Non-Alcoholic Beverages at a monthly level. Both models have high explanatory power, low residual error. Both models predicted close values to the actual CPI, and the errors were 8.74 and 5.77

The first model is based on correlation and improved by significance values of the predictors. The first model consists of many initial predictors such as USD Exchange Rate, Interest Rate, Statement on the current financial situation of household, and Assessment on spending money on semi-durable goods. However, the first model had a high autocorrelation between residuals. So, lag values are added to the model, and as a result, another model has obtained.

The second model is generated with the addition of the lag variables as predictors. However, lag1 variable made any other variable then itself and USD Exchange Rate insignificant. This may be a result of interaction between USD Exchange rate and other predictors. On the other hand, the second model fitted actual values slightly better, but maybe it is overfitted.

I think more test data from 2021 is needed to decide for sure which model is better and more robust. Until 2021 statics are calculated and published on EVDS.